- Processing to ensure the best ML model
o reduces a large input data set to its relevant features
o preserves the most relevant, non-redundant data


- Feature Selection
o chooses a subset of relevant features, the most representative ones
o excludes redundant or irrelevant data, noise, etc.
o also called variable selection, attribute selection, or variable subset selection


- Feature Extraction
o extracts information from the original feature set to create new features
o reduces the initial raw data to more manageable data for processing





- Simplifies the learning model
o easier interpretation of the model 
o easier translation of results 


- Reduces processing time 
o translates to shorter training time 
o cheaper to optimize 


- Avoids the curse of dimensionality
o excess dimensions (i.e. features) -> sparse data -> statistical insignificance
o feature selection (FS) reduces the number of dimensions -> improved statistical significance


- Reduces overfitting
o improves accuracy or other performance indicators
o increases the generalization potential of a classifier


No Feature Selection vs. Feature Selection 
Regularization methods such as LASSO, Ridge regression, Elastic Net


Benefits of Feature Engineering 


- A boost in training speed
- An improvement in model accuracy
- A reduction in the risk of overfitting
- A rise in model explainability
- Enhanced model computation efficiency
- Reduced generalization error
- Better data visualization


Feature Selection 





Feature Selection Techniques 
(Figure: remove a single feature, then run again)
- Bag-of-Words: an NLP technique
o extracts the words (features) from a sentence, document, website, etc. and classifies them by frequency of use
o targets specific words for the vocabulary of the learning model
o can also be applied to image processing


- Selecting the features is more important than designing the prediction model
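
The bag-of-words idea above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production tokenizer: the function names (`tokenize`, `build_vocabulary`, `bag_of_words`) and the toy corpus are assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents, max_words=None):
    """Return the words seen in the corpus, most frequent first."""
    totals = Counter()
    for doc in documents:
        totals.update(tokenize(doc))
    return [word for word, _ in totals.most_common(max_words)]


def bag_of_words(document, vocabulary):
    """Turn one document into a vector of word counts aligned with the vocabulary."""
    counts = Counter(tokenize(document))
    return [counts[word] for word in vocabulary]


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
```

Passing `max_words` keeps only the most frequent words, which is one simple way to target a specific vocabulary for the learning model.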





(Figure: recursive feature elimination across iterations 1, 2, and 3)

- Recursive Feature Elimination (RFE) selects features by recursively considering smaller and smaller sets of features
o trains the model on the original number of features
o an importance is assigned to each feature
o the least important features are removed
o the process is repeated until a specified number of features is reached
1. Start with the full 13-feature model (n = 13)
2. Find the predictor with the smallest effect (correlation coefficient)
3. Remove that predictor and build/test a new model with n-1 features
4. Set n = n-1 and repeat until n equals the desired value
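
The four steps above can be sketched as follows. This is a simplified illustration under stated assumptions: "effect" is measured as the absolute Pearson correlation between each predictor and the target, the model rebuild in step 3 is left as a comment, and the names `pearson` and `backward_eliminate` and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def backward_eliminate(features, target, n_desired):
    """Drop the weakest predictor one at a time until n_desired remain.

    features maps a feature name to its list of values.
    Returns (kept feature names, removed feature names in removal order).
    """
    remaining = dict(features)
    removed = []
    while len(remaining) > n_desired:
        # Step 2: the predictor with the smallest absolute correlation.
        weakest = min(remaining, key=lambda name: abs(pearson(remaining[name], target)))
        # Step 3: remove it; a real workflow would rebuild and test the model here.
        removed.append(weakest)
        del remaining[weakest]
    return list(remaining), removed


# Hypothetical toy data, for illustration only.
target = [1, 2, 3, 4, 5, 6]
features = {
    "strong": [2, 4, 6, 8, 10, 12],
    "medium": [1, 3, 2, 5, 4, 6],
    "noise": [3, 1, 3, 1, 3, 1],
}
kept, removed = backward_eliminate(features, target, n_desired=1)
```

Because the correlation with the target does not change when other features are dropped, this sketch is equivalent to ranking once. Refitting a model at each step and using its importances is what makes the procedure genuinely recursive.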





Forward Selection 


Start with a model with no variables 


Null Model 




Add the most significant variable 








Keep adding the most significant variable until reaching 
the stopping rule or running out of variables 






Model with 2 variables 
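
The forward-selection loop described above can be sketched in plain Python. The slides do not say how "most significant" is measured, so this sketch assumes a stand-in criterion: the candidate that most reduces the residual sum of squares (RSS) of an ordinary least-squares fit, with a minimum RSS gain as the stopping rule. All function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no perfectly collinear columns).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum(
        (yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
        for row, yi in zip(design, y)
    )


def forward_select(features, y, min_gain=1e-6):
    """Greedily add the feature that lowers the RSS most; stop when the gain is too small."""
    selected = []
    candidates = list(features)
    best_rss = rss([], y)  # null model: intercept only
    while candidates:
        scores = {
            name: rss([features[n] for n in selected + [name]], y)
            for name in candidates
        }
        best = min(scores, key=scores.get)
        if best_rss - scores[best] < min_gain:
            break  # stopping rule: no candidate improves the fit enough
        selected.append(best)
        candidates.remove(best)
        best_rss = scores[best]
    return selected


# Hypothetical toy data: y depends on x1 and x2 only.
features = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [5, 3, 8, 1, 7, 2, 6, 4],
    "x3": [1, -1, 1, -1, -1, 1, -1, 1],
}
y = [2 * a + b for a, b in zip(features["x1"], features["x2"])]
chosen = forward_select(features, y)
```

Backward selection is the mirror image: start from the full model and repeatedly drop the variable whose removal increases the RSS least.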
Backward Selection


Start with a model that contains all the variables


Full Model


Remove the least significant variable








Keep removing the least significant variable until 
reaching the stopping rule or running out of variables 
Feature Extraction


Feature Selection & Feature Extraction 


- both are used for dimensionality reduction

o feature selection algorithms maintain the original features
> used when the original features must be kept
> used when model explainability and generalizability are required

o feature extraction algorithms transform the data
> derive useful information from the data
> can be computationally expensive and prone to overfitting
> used when model explainability is not a requirement


- selecting the appropriate feature engineering method
o may be more important than the prediction model design





- Principal Component Analysis (PCA)
- Independent Component Analysis (ICA)
- Linear Discriminant Analysis (LDA)
- Locally Linear Embedding (LLE)
- t-distributed Stochastic Neighbor Embedding (t-SNE)
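
To make the first item in this list concrete, here is a minimal sketch of PCA that finds only the first principal component. It assumes power iteration on the sample covariance matrix, which is a simple stand-in for the full eigendecomposition a library would use. The helper names and the toy data are hypothetical.

```python
import math


def center(rows):
    """Subtract the column means so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered_rows):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered_rows), len(centered_rows[0])
    return [
        [sum(r[a] * r[b] for r in centered_rows) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def first_component(cov, n_iter=100):
    """Power iteration: unit eigenvector of the largest eigenvalue of cov."""
    v = [1.0] * len(cov)
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical toy data lying roughly along the direction (1, 2).
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered = center(data)
pc1 = first_component(covariance(centered))

# The extracted feature: each sample projected onto the first component.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
```

The two original columns are replaced by the single `scores` column, which illustrates why extraction reduces dimensionality but no longer keeps the original features.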


Filter Methods 





(Figure: filter method pipeline, from all features to a selection of the best features)
Wrapper Methods


Polynomial Features





Types of Polynomials 


Linear: $ax + b = 0$
Quadratic: $ax^2 + bx + c = 0$
Cubic: $ax^3 + bx^2 + cx + d = 0$
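
As a small illustration of polynomial features, the sketch below expands a single input value into its powers, so that a linear model on the expanded features can represent the linear, quadratic, and cubic forms above. The function names are hypothetical and the expansion covers one input variable only.

```python
def polynomial_features(x, degree):
    """Expand a scalar x into [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]


def evaluate(coefficients, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... as a dot product with the expanded features."""
    feats = polynomial_features(x, len(coefficients) - 1)
    return sum(c * f for c, f in zip(coefficients, feats))


cubic_feats = polynomial_features(2, 3)  # features for a cubic model at x = 2
value = evaluate([1, 0, 2], 3)           # 1 + 0*x + 2*x^2 at x = 3
```

Since the model stays linear in its coefficients, choosing the degree is the main design decision: a higher degree fits more curvature but raises the risk of overfitting.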








Regression Models: Pros and Cons


Linear Regression
Pros: works on any size of dataset; gives information about the relevance of features
Cons: relies on the linear regression assumptions


Polynomial Regression
Pros: works on any size of dataset; works very well on nonlinear problems
Cons: need to choose the right polynomial degree for a good bias/variance tradeoff


SVR
Pros: easily adaptable; works very well on nonlinear problems; not biased by outliers
Cons: compulsory to apply feature scaling; not well known; more difficult to understand


Decision Tree Regression
Pros: interpretability; no need for feature scaling; works on both linear and nonlinear problems
Cons: poor results on too-small datasets; overfitting can easily occur


Random Forest Regression
Pros: powerful and accurate; good performance on many problems, including nonlinear ones
Cons: no interpretability; overfitting can easily occur; need to choose the number of trees
(Figure: taxonomy diagram showing feature extraction alongside supervised feature selection, with the latter split into filters, embedded methods, and wrappers)
Methods named in the figure: missing value ratio, correlation coefficient, regularization (L1, L2, Elastic Net), Boruta algorithm, forward feature selection, Fast Correlation-Based Filter (FCBF), mutual information, permutation feature importance, Relief and its variants (Extended ReliefF, RReliefF, Tuned ReliefF, Relieved-F)














Appendix 


Feature Engineering Methods 


Filter Methods

- Generic set of methods that do not incorporate a specific machine learning algorithm.
- Much faster than wrapper methods in terms of time complexity.
- Less prone to overfitting.
- Examples: correlation, chi-square test, ANOVA, information gain, etc.
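
Information gain, one of the examples just listed, can be computed without training any model, which is what makes it a filter criterion. The sketch below assumes categorical features and labels. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Score every feature independently and sort from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, for illustration only.
labels = ["yes", "yes", "no", "no"]
features = {
    "windy": ["y", "n", "y", "n"],
    "outlook": ["sun", "sun", "rain", "rain"],
}
ranking = rank_features(features, labels)
```

Each feature is scored on its own, so the cost grows linearly with the number of features. The trade-off is that interactions between features are ignored.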


Wrapper Methods

- Evaluate feature subsets with a specific machine learning algorithm to find the optimal features.
- High computation time for a dataset with many features.
- High chance of overfitting, because they involve training machine learning models with different combinations of features.
- Examples: forward selection, backward elimination, stepwise selection, etc.


Embedded Methods

- Embed (fix) features during the model-building process. Feature selection is done by observing each iteration of the model training phase.
- Sit between filter methods and wrapper methods in terms of time complexity.
- Generally used to reduce overfitting by penalizing model coefficients that grow too large.
- Examples: LASSO, Elastic Net, Ridge regression, etc.
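
The way a penalty can select features during training is easiest to see with LASSO. The sketch below assumes cyclic coordinate descent on the objective (1/2n)·||y − Xw||² + alpha·||w||₁ with centered columns and no intercept. It is an illustrative sketch, not a library implementation, and the names and toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(columns, y, alpha, n_iter=200):
    """Cyclic coordinate descent for LASSO on centered data (no intercept)."""
    n = len(y)
    w = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Residual of the fit that ignores feature j.
            residual = [
                y[i] - sum(w[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * residual[i] for i in range(n)) / n
            z = sum(v * v for v in col) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Hypothetical centered toy data: y depends strongly on x1 and barely on x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]
y = [3 * a + 0.1 * b for a, b in zip(x1, x2)]
weights = lasso([x1, x2], y, alpha=0.5)
```

The L1 penalty sets the weak coefficient to exactly zero, which is the selection step. An L2 (Ridge) penalty only shrinks coefficients toward zero, so on its own it regularizes without removing features.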
Checklist


- Domain Knowledge
- Missing Values
- Principal Component Analysis